Algorithms for maximum-likelihood bandwidth selection in kernel density estimators

Authors

  • José M. Leiva-Murillo
  • Antonio Artés-Rodríguez
Abstract

In machine learning and statistics, kernel density estimators are rarely used on multivariate data because of the difficulty of finding a kernel bandwidth that avoids overfitting. However, recent advances in information-theoretic learning have revived interest in these models. With this motivation, in this paper we revisit the classical statistical problem of data-driven bandwidth selection by cross-validation maximum likelihood for Gaussian kernels. We solve the optimization problem both in the spherical case and in the general case, where a full covariance matrix is considered for the kernel. The fixed-point algorithms proposed in this paper obtain the maximum-likelihood bandwidth in a few iterations, without performing an exhaustive bandwidth search, which is infeasible in the multivariate case. The convergence of the proposed methods is proved. A set of classification experiments demonstrates the usefulness of the obtained models in pattern recognition. © 2012 Elsevier B.V. All rights reserved.
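To illustrate the kind of procedure the abstract describes, here is a minimal sketch of a fixed-point iteration for the spherical-bandwidth case: the scalar variance sigma² of a Gaussian kernel is updated via an EM-like step derived from the leave-one-out cross-validation log-likelihood. This is our own illustrative implementation of the general idea, not the paper's exact algorithm; the function name and stopping rule are assumptions.

```python
import numpy as np

def loo_ml_bandwidth(X, n_iter=100, tol=1e-8):
    """Fixed-point search for the spherical kernel variance sigma^2 that
    (locally) maximizes the leave-one-out Gaussian-KDE log-likelihood.
    Illustrative sketch only, not the paper's published algorithm."""
    n, d = X.shape
    # Pairwise squared distances; the diagonal is excluded (leave-one-out).
    D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(D2, np.inf)
    # Crude initial guess: mean off-diagonal squared distance per dimension.
    sigma2 = D2[np.isfinite(D2)].mean() / d
    for _ in range(n_iter):
        # Responsibilities w_ij ∝ exp(-||x_i - x_j||^2 / (2 sigma^2)), j != i,
        # normalized over j; computed in log space for numerical stability.
        logw = -D2 / (2.0 * sigma2)
        logw -= logw.max(axis=1, keepdims=True)
        w = np.exp(logw)
        w /= w.sum(axis=1, keepdims=True)
        # EM-like update: sigma^2 = (1 / (n d)) * sum_ij w_ij ||x_i - x_j||^2.
        D2_finite = np.where(np.isfinite(D2), D2, 0.0)
        new_sigma2 = (w * D2_finite).sum() / (n * d)
        if abs(new_sigma2 - sigma2) < tol * sigma2:
            sigma2 = new_sigma2
            break
        sigma2 = new_sigma2
    return sigma2
```

Because each update reweights neighbors by the current kernel, the iteration typically settles in a handful of steps, which is the practical advantage the abstract claims over an exhaustive grid search over bandwidths.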


Similar resources

Discrimination of time series based on kernel method

Classical discrimination methods, such as linear and quadratic discriminant analysis, are not very efficient for non-Gaussian or nonlinear time series data. Nonparametric kernel discrimination, in which kernel estimators of the likelihood functions are used in place of their true values, has been shown to perform well. The misclassification rate of kernel discrimination is usually less than ...


Asymptotic Behaviors of Nearest Neighbor Kernel Density Estimator in Left-truncated Data

Kernel density estimators are the basic tools for density estimation in nonparametric statistics. The k-nearest neighbor kernel estimator is a special form of kernel density estimator in which the bandwidth varies with the location of the sample points. In this paper, we first introduce the k-nearest neighbor kernel density estimator in the random left-truncatio...


Uniform Convergence of Weighted Sums of Non- and Semi-parametric Residuals for Estimation and Testing∗

A new uniform expansion is introduced for sums of weighted kernel-based regression residuals from nonparametric or semiparametric models. This result is useful for deriving asymptotic properties of semiparametric estimators and test statistics with data-dependent bandwidth, random trimming, and estimated weights. An extension allows for generated regressors, without requiring the calculation of...


Asymptotic optimality of likelihood-based cross-validation.

Likelihood-based cross-validation is a statistical tool for selecting a density estimate based on n i.i.d. observations from the true density among a collection of candidate density estimators. General examples are the selection of a model indexing a maximum likelihood estimator, and the selection of a bandwidth indexing a nonparametric (e.g. kernel) density estimator. In this article, we estab...


Maximum likelihood kernel density estimation: On the potential of convolution sieves

Methods for improving the basic kernel density estimator include variable locations, variable bandwidths and variable weights. Typically these methods are implemented separately and via pilot estimation of variation functions derived from asymptotic considerations. The starting point here is a simple maximum likelihood procedure which allows (in its greatest generality) variation of all these q...



Journal title:
  • Pattern Recognition Letters

Volume 33, Issue

Pages -

Publication year: 2012